Implementation and Evaluation of Parallel Data Mining on PC Cluster and Optimization of its Execution Environments

Authors

  • Masato Oguchi
  • Masaru Kitsuregawa
Abstract

Personal Computer/Workstation clusters have been studied intensively in the field of parallel and distributed computing. From the viewpoint of applications, data-intensive applications such as data mining and ad-hoc query processing in databases are considered as important for high-performance computing as conventional scientific calculations. We have built and evaluated PC cluster pilot systems, especially a SAN-connected PC cluster, and implemented parallel data mining on them. Several optimizations for executing this application, including dynamic data allocation, are discussed.

Keywords: PC cluster, Data Mining, Storage Area Network, Optimization, Dynamic data allocation.
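The abstract names dynamic data allocation but does not say how it works. As a generic illustration only, the sketch below assumes two things: that the data mining workload is association-rule support counting (the abstract says only "data mining"), and that every node can read any chunk of the data, as shared SAN storage would allow. Idle workers pull the next chunk from a shared pool instead of receiving a fixed partition, so a slow or heavily loaded node simply processes fewer chunks. Threads stand in for cluster nodes, and the function names `count_pairs` and `parallel_pair_support` are hypothetical. This is not the authors' implementation.

```python
import queue
import threading
from collections import Counter
from itertools import combinations


def count_pairs(transactions):
    """Count the support of every 2-itemset in a list of transactions."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return counts


def parallel_pair_support(transactions, num_workers=4, chunk_size=2):
    """Hypothetical pull-based (dynamic) allocation of data chunks to workers.

    Each thread stands in for a cluster node. Instead of a static partition,
    idle workers take the next chunk from a shared pool, so faster workers
    end up processing more chunks.
    """
    chunks = queue.Queue()
    for i in range(0, len(transactions), chunk_size):
        chunks.put(transactions[i:i + chunk_size])

    # One private counter per worker avoids locking during counting.
    local_counts = [Counter() for _ in range(num_workers)]
    chunks_done = [0] * num_workers

    def worker(wid):
        while True:
            try:
                chunk = chunks.get_nowait()  # pull work on demand
            except queue.Empty:
                return  # pool exhausted: this worker is finished
            local_counts[wid].update(count_pairs(chunk))
            chunks_done[wid] += 1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Global reduction: merge the per-worker partial support counts.
    total = Counter()
    for c in local_counts:
        total.update(c)
    return total, chunks_done


if __name__ == "__main__":
    transactions = [
        ["bread", "milk"],
        ["bread", "butter", "milk"],
        ["milk", "butter"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
    ]
    support, per_worker = parallel_pair_support(transactions, num_workers=2)
    min_support = 3
    frequent = {pair: n for pair, n in support.items() if n >= min_support}
    print(frequent)
    print("chunks per worker:", per_worker)
```

Because Python threads share one interpreter lock, this shows the scheduling idea rather than real speedup. On an actual cluster, the shared queue would be replaced by a coordinator or by shared storage metadata, and the final merge by a network reduction.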

Similar resources

Parallel Data Mining on ATM-Connected PC Cluster and Optimization of Its Execution Environments

In this paper, we have constructed a large-scale ATM-connected PC cluster consisting of 100 PCs, implemented a data mining application, and optimized its execution environment. The default parameters of the TCP retransmission mechanism cannot provide good performance for the data mining application, since many collisions occur during all-to-all multicasting in the large-scale PC cluster. Using a ...

Optimizing Protocol Parameters to Large Scale PC Cluster and Evaluation of its Effectiveness with Parallel Data Mining

Recently, PC clusters have been studied intensively as candidates for next-generation large-scale parallel computers. ATM technology is a strong candidate for the de facto standard of high-speed communication networks. An ATM-connected PC cluster is therefore a very promising platform for future high-performance computing from the cost/performance point of view. In this paper, an ATM ...

Optimizing the parSOM Neural Network Implementation for Data Mining with Distributed Memory Systems and Cluster Computing

The self-organizing map is a prominent unsupervised neural network model which lends itself to the analysis of high-dimensional input data and data mining applications. However, the long execution times required to train the map limit its application in many high-performance data analysis domains. In this paper we discuss the parSOM implementation, a software-based parallel...

Towards the Optimization of Data Mining Execution Process in Distributed Environments

The distribution and heterogeneity of data resources in loosely coupled distributed environments pose challenges for data mining applications, because the parallelism and collaboration of nodes must be considered during optimization and execution. Data mining algorithms can be represented as data mining execution processes which are composed of finer-grained operators, and optimizing ...

Data mining on PC cluster connected with storage area network: its preliminary experimental results

Personal computer/Workstation (PC/WS) clusters have recently become a hot research topic in the field of parallel and distributed computing. They are considered to play an important role as large-scale computer systems, such as large server sites and/or high-performance parallel computers, because of their good scalability and cost/performance ratio. From the viewpoint of applications, data inte...

Publication date: 2001